AITopics | box size

box size

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

Zhou, Yuqi, Dai, Sunhao, Wang, Shuai, Zhou, Kaiwen, Jia, Qinglin, Xu, Jun

arXiv.org Artificial Intelligence, May-23-2025

Recent Graphical User Interface (GUI) agents replicate the R1-Zero paradigm, coupling online Reinforcement Learning (RL) with explicit chain-of-thought reasoning prior to object grounding and thereby achieving substantial performance gains. In this paper, we first conduct extensive analysis experiments on three key components of that training pipeline: input design, output evaluation, and policy update, each revealing distinct challenges arising from blindly applying general-purpose RL without adapting to GUI grounding tasks. Input design: Current templates encourage the model to generate chain-of-thought reasoning, but longer chains unexpectedly lead to worse grounding performance. Output evaluation: Reward functions based on hit signals or box area allow models to exploit box size, leading to reward hacking and poor localization quality. Policy update: Online RL tends to overfit easy examples due to biases in length and sample difficulty, leading to under-optimization on harder cases. To address these issues, we propose three targeted solutions. First, we adopt a Fast Thinking Template that encourages direct answer generation, reducing excessive reasoning during training. Second, we incorporate a box size constraint into the reward function to mitigate reward hacking. Third, we revise the RL objective by adjusting length normalization and adding a difficulty-aware scaling factor, enabling better optimization on hard samples. Our GUI-G1-3B, trained on 17K public samples with Qwen2.5-VL-3B-Instruct, achieves 90.3% accuracy on ScreenSpot and 37.1% on ScreenSpot-Pro. This surpasses all prior models of similar size and even outperforms the larger UI-TARS-7B, establishing a new state-of-the-art in GUI agent grounding. The project repository is available at https://github.com/Yuqi-Zhou/GUI-G1.
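The reward-hacking concern in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's reward function: the `(x1, y1, x2, y2)` box convention, the function names, the area-ratio term, and the example boxes are all assumptions made for illustration. The sketch shows that a reward based only on a hit signal (here, whether the centre of the predicted box falls inside the ground-truth box) cannot distinguish a well-fitted box from a grossly oversized or undersized one, and that a hypothetical size term removes that indifference.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); illustrative convention


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def hit_reward(pred: Box, gt: Box) -> float:
    """1.0 if the centre of the predicted box lies inside the ground-truth box."""
    cx, cy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    return 1.0 if gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3] else 0.0


def size_constrained_reward(pred: Box, gt: Box) -> float:
    """Hypothetical reward: the hit signal scaled by how closely the areas match."""
    a_pred, a_gt = area(pred), area(gt)
    if max(a_pred, a_gt) == 0.0:
        return 0.0
    size_term = min(a_pred, a_gt) / max(a_pred, a_gt)  # lies in [0, 1]
    return hit_reward(pred, gt) * size_term


# Toy boxes that all share the same centre as the ground truth.
gt = (10.0, 10.0, 50.0, 30.0)
tight = (12.0, 11.0, 48.0, 29.0)
huge = (0.0, 0.0, 60.0, 40.0)
tiny = (29.0, 19.0, 31.0, 21.0)

for name, box in [("tight", tight), ("huge", huge), ("tiny", tiny)]:
    print(name, hit_reward(box, gt), round(size_constrained_reward(box, gt), 3))
```

In this toy setup all three predictions receive the same hit reward, while the size-aware variant ranks the tight box above the oversized and undersized ones.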

arxiv preprint arxiv, large language model, machine learning

2505.1581

Genre: Research Report (0.82)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)

A Visual-Analytical Approach for Automatic Detection of Cyclonic Events in Satellite Observations

Agrawal, Akash, Mohapatra, Mayesh, Raja, Abhinav, Tiwari, Paritosh, Pattanaik, Vishwajeet, Jaiswal, Neeru, Agarwal, Arpit, Rathore, Punit

arXiv.org Artificial Intelligence, Sep-25-2024

Estimating the location and intensity of tropical cyclones is crucial for predicting catastrophic weather events. In this study, we approach this task as a detection and regression challenge, specifically over the North Indian Ocean (NIO) region, where best-track location and wind speed information serve as the labels. The current process for cyclone detection and intensity estimation involves physics-based simulation studies, which are time-consuming; using only image features will automate the process for significantly faster and more accurate predictions. While conventional methods typically necessitate substantial prior knowledge for training, we are exploring alternative approaches to enhance efficiency. This research focuses specifically on cyclone detection, intensity estimation, and related aspects using only image input and data-driven approaches, which will lead to faster inference and automate the process, as opposed to the current numerical weather prediction (NWP) models being utilized at SAC. For algorithm development, a novel two-stage detection and intensity estimation module is proposed. In the first-level detection, we try to localize the cyclone over an entire image as captured by INSAT3D over the NIO. For the intensity estimation task, we propose a CNN-LSTM network with a ResNet-18 backbone, which works on the cyclone-centered images and captures both temporal and spatial characteristics.

artificial intelligence, cyclone, machine learning

2410.08218

Country:

Asia > India > Karnataka > Bengaluru (0.05)
Asia > Myanmar (0.04)
Asia > India > Andhra Pradesh (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Adversarial Detection: Attacking Object Detection in Real Time

Wu, Han, Yunas, Syed, Rowlands, Sareh, Ruan, Wenjie, Wahlstrom, Johan

arXiv.org Artificial Intelligence, Dec-12-2023

Intelligent robots rely on object detection models to perceive the environment. Following advances in deep learning security, it has been revealed that object detection models are vulnerable to adversarial attacks. However, prior research primarily focuses on attacking static images or offline videos, so it is still unclear whether such attacks could jeopardize real-world robotic applications in dynamic environments. This paper bridges this gap by presenting the first real-time online attack against object detection models. We devise three attacks that fabricate bounding boxes for nonexistent objects at desired locations. The attacks achieve a success rate of about 90% within about 20 iterations. The demo video is available at https://youtu.be/zJZ1aNlXsMU.

detection model, overlay, perturbation

doi: 10.1109/IV55152.2023.10186608

2209.01962

Country:

Europe > United Kingdom > England (0.04)
Asia > Nepal (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

One-Shot General Object Localization

You, Yang, Miao, Zhuochen, Xiong, Kai, Wang, Weiming, Lu, Cewu

arXiv.org Artificial Intelligence, Nov-23-2022

This paper presents a general one-shot object localization algorithm called OneLoc. Current one-shot object localization or detection methods either rely on a slow exhaustive feature matching process or lack the ability to generalize to novel objects. In contrast, our proposed OneLoc algorithm efficiently finds the object center and bounding box size by a special voting scheme. To keep our method scale-invariant, only unit center offset directions and relative sizes are estimated. A novel dense equalized voting module is proposed to better locate small texture-less objects. Experiments show that the proposed method achieves state-of-the-art overall performance on two datasets: OnePose dataset and LINEMOD dataset. In addition, our method can also achieve one-shot multi-instance detection and non-rigid object localization. Code repository: https://github.com/qq456cvb/OneLoc.

artificial intelligence, dataset, machine learning

2211.13392

Country:

Asia > China > Shanghai > Shanghai (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Sequential Community Mode Estimation

Jain, Shubham Anand, Goenka, Shreyas, Bapna, Divyam, Karamchandani, Nikhil, Nair, Jayakrishnan

arXiv.org Machine Learning, Nov-16-2021

We consider a population, partitioned into a set of communities, and study the problem of identifying the largest community within the population via sequential, random sampling of individuals. There are multiple sampling domains, referred to as boxes, which also partition the population. Each box may consist of individuals of different communities, and each community may in turn be spread across multiple boxes. The learning agent can, at any time, sample (with replacement) a random individual from any chosen box; when this is done, the agent learns the community the sampled individual belongs to, and also whether or not this individual has been sampled before. The goal of the agent is to minimize the probability of mis-identifying the largest community in a fixed budget setting, by optimizing both the sampling strategy and the decision rule. We propose and analyse novel algorithms for this problem, and also establish information-theoretic lower bounds on the probability of error under any algorithm. In several cases of interest, the exponential decay rates of the probability of error under our algorithms are shown to be optimal up to constant factors. The proposed algorithms are further validated via simulations on real-world datasets.
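As a rough illustration of the fixed-budget setting described above, the following Python sketch implements a naive baseline, not one of the paper's algorithms. It assumes a single box, ignores the "sampled before" feedback, and simply returns the community seen most often among the samples. The function name, toy population, budget, and seed are all hypothetical.

```python
import random
from collections import Counter
from typing import List


def estimate_largest_community(
    population: List[int], budget: int, rng: random.Random
) -> int:
    """Draw `budget` individuals with replacement and return the most
    frequently observed community label (the empirical mode)."""
    samples = rng.choices(population, k=budget)
    counts = Counter(samples)
    return counts.most_common(1)[0][0]


# population[i] is the community label of individual i (toy data):
# community 0 has 60 members, community 1 has 30, community 2 has 10.
population = [0] * 60 + [1] * 30 + [2] * 10

guess = estimate_largest_community(population, budget=200, rng=random.Random(0))
print("estimated largest community:", guess)
```

With several boxes, an agent would additionally have to decide how to split the budget across boxes and how to combine per-box counts, which is where the sampling strategy and decision rule studied in the paper come in.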

algorithm, largest community, probability

2111.08535

Country:

North America > Canada (0.14)
South America > Brazil (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Compositional Generalization in Image Captioning

Nikolaus, Mitja, Abdou, Mostafa, Lamm, Matthew, Aralikatte, Rahul, Elliott, Desmond

arXiv.org Machine Learning, Sep-16-2019

Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation and image-sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts compared to state-of-the-art captioning models.

caption, machine learning, natural language

1909.04402

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

A Machine Learning Approach to Shipping Box Design

Yang, Guang, Mu, Cun

arXiv.org Machine Learning, Sep-29-2018

Having the right assortment of shipping boxes in the fulfillment warehouse to pack and ship customers' online orders is an indispensable and integral part of today's eCommerce business, as it not only helps maintain a profitable business but also creates great experiences for customers. However, it is an extremely challenging operations task to strategically select the best combination of tens of box sizes from thousands of feasible ones to serve hundreds of thousands of orders placed daily on millions of inventory products. In this paper, we present a machine learning approach to tackle the task by formulating the box design problem prescriptively as a generalized version of the weighted k-medoids clustering problem, where the parameters are estimated through a variety of descriptive analytics. We test this machine learning approach on fulfillment data collected from Walmart U.S. eCommerce, and our approach is shown to be capable of improving the box utilization rate by more than 10%.

Keywords: Shipping box design, k-medoids clustering, eCommerce, packaging science, operations research

1 Introduction

The assortment of shipping boxes utilized by the fulfillment warehouse to pack and ship customers' online orders is a critical component of today's eCommerce business, as it directly affects not only profit margins but also customers' experience.
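The weighted k-medoids view of box selection mentioned in the abstract can be sketched on a toy instance. The Python example below is an assumption-laden illustration, not the authors' formulation: it uses wasted volume as the assignment cost, lets an order use only a box it fits in (with axis-aligned rotation), weights each order type by a made-up frequency, and finds the best assortment by brute force over all size-k subsets of candidate boxes. All names and numbers are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Dims = Tuple[float, float, float]


def volume(d: Dims) -> float:
    return d[0] * d[1] * d[2]


def fits(item: Dims, box: Dims) -> bool:
    """True if the item fits in the box under some axis-aligned rotation."""
    return all(i <= b for i, b in zip(sorted(item), sorted(box)))


def cost(item: Dims, box: Dims) -> float:
    """Wasted volume if the item fits, otherwise infinity (infeasible)."""
    return volume(box) - volume(item) if fits(item, box) else math.inf


def best_assortment(
    items: List[Dims], weights: Sequence[float], candidates: List[Dims], k: int
) -> Tuple[Tuple[Dims, ...], float]:
    """Exhaustively pick the k candidate boxes (the 'medoids') that minimise
    total weighted wasted volume; weights are assumed to be positive."""
    best, best_cost = (), math.inf
    for combo in combinations(candidates, k):
        total = sum(
            w * min(cost(item, box) for box in combo)
            for item, w in zip(items, weights)
        )
        if total < best_cost:
            best, best_cost = combo, total
    return best, best_cost


# Toy order profiles with hypothetical daily frequencies.
items = [(2, 2, 2), (2, 3, 2), (5, 5, 5), (6, 5, 4)]
weights = [100, 50, 10, 5]
candidates = [(2, 2, 2), (3, 3, 3), (5, 5, 5), (6, 6, 6), (10, 10, 10)]

chosen, total_waste = best_assortment(items, weights, candidates, k=2)
print(chosen, total_waste)
```

Brute force is only workable for tiny instances like this one; at the scale the abstract describes (tens of boxes chosen from thousands of candidates), a heuristic such as a swap-based k-medoids search would be needed instead.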

artificial intelligence, box size, machine learning

1809.1021

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Text Box Size, Skill, and Iterative Practice in a Writing Task

Raine, Roxanne Benoit (University of Memphis) | Mintz, Lisa (University of Memphis) | Crossley, Scott A. (Georgia State University) | Dai, Jianmin (University of Memphis) | McNamara, Danielle S. (University of Memphis)

AAAI Conferences, May-18-2011

Although freewriting strategies are commonly taught in composition courses, there have been few empirical studies on freewriting. We address this gap by examining effects of prior writing skills (as measured by a pre-write essay), freewriting training, text-box size (1, 10, 20 lines), and repetitive writing on freewriting quality. Participants watched an agent-based vicarious learning freewriting instruction video or a control video including brief instructions on freewriting. After training, participants wrote six freewrites, two in each box size. Lesson delivery and text box size did not affect expert human ratings of the freewrites. Furthermore, participants did not benefit from writing successive freewrites regardless of their initial skill level. We describe how these results have been used to inform the design of Writing-Pal, an essay-writing intelligent tutoring system.

box size, freewrite, participant

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New York (0.04)
North America > United States > Mississippi (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.54)
Education > Educational Setting > K-12 Education (0.46)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.34)
